Overview

Brought to you by YData

Dataset statistics

Statistic | Dataset A | Dataset B
Number of variables | 12 | 12
Number of observations | 446 | 446
Missing cells | 427 | 422
Missing cells (%) | 8.0% | 7.9%
Duplicate rows | 0 | 0
Duplicate rows (%) | 0.0% | 0.0%
Total size in memory | 45.3 KiB | 45.3 KiB
Average record size in memory | 104.0 B | 104.0 B
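
The missing-cell percentages are consistent with dividing the number of missing cells by the total number of cells (variables × observations). A minimal Python sketch of that arithmetic follows; the helper name `missing_cell_pct` is an illustrative assumption, not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Return missing cells as a percentage of all cells in the table."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


# Counts copied from the Dataset statistics table.
pct_a = missing_cell_pct(427, 12, 446)
pct_b = missing_cell_pct(422, 12, 446)

print(f"Dataset A: {pct_a:.1f}%")  # prints "Dataset A: 8.0%"
print(f"Dataset B: {pct_b:.1f}%")  # prints "Dataset B: 7.9%"
```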

Variable types

Type | Dataset A | Dataset B
Numeric | 5 | 5
Categorical | 4 | 4
Text | 3 | 3

Alerts

Dataset A | Dataset B | Alert type
Age has 88 (19.7%) missing values | Age has 83 (18.6%) missing values | Missing
Cabin has 338 (75.8%) missing values | Cabin has 338 (75.8%) missing values | Missing
PassengerId has unique values | PassengerId has unique values | Unique
Name has unique values | Name has unique values | Unique
SibSp has 311 (69.7%) zeros | SibSp has 308 (69.1%) zeros | Zeros
Parch has 342 (76.7%) zeros | Parch has 345 (77.4%) zeros | Zeros
Fare has 11 (2.5%) zeros | Fare has 9 (2.0%) zeros | Zeros
Alert not present in this dataset | Sex is highly overall correlated with Survived | High correlation
Alert not present in this dataset | Survived is highly overall correlated with Sex | High correlation

Reproduction

Item | Dataset A | Dataset B
Analysis started | 2025-03-11 15:19:02.612077 | 2025-03-11 15:19:05.085419
Analysis finished | 2025-03-11 15:19:05.082227 | 2025-03-11 15:19:07.626429
Duration | 2.47 seconds | 2.54 seconds
Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0
Download configuration | config.json | config.json
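
The reported durations can be recovered from the start and finish timestamps. The sketch below uses only the Python standard library; the function name `duration_seconds` and the format constant are illustrative assumptions.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2025-03-11 15:19:02.612077", "2025-03-11 15:19:05.082227")
dur_b = duration_seconds("2025-03-11 15:19:05.085419", "2025-03-11 15:19:07.626429")

print(f"{dur_a:.2f} seconds")  # prints "2.47 seconds"
print(f"{dur_b:.2f} seconds")  # prints "2.54 seconds"
```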

Variables

PassengerId
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 433 | 446.37444
Minimum | 1 | 3
Maximum | 884 | 891
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 1 | 3
5-th percentile | 52.5 | 41.25
Q1 | 216.25 | 232.75
Median | 427 | 439.5
Q3 | 647.25 | 671.75
95-th percentile | 832.75 | 851.5
Maximum | 884 | 891
Range | 883 | 888
Interquartile range (IQR) | 431 | 439

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 252.47345 | 257.66194
Coefficient of variation (CV) | 0.58307957 | 0.57723275
Kurtosis | -1.1647224 | -1.194053
Mean | 433 | 446.37444
Median Absolute Deviation (MAD) | 214.5 | 221
Skewness | 0.060243897 | 0.049195146
Sum | 193118 | 199083
Variance | 63742.845 | 66389.677
Monotonicity | Not monotonic | Not monotonic
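
Several of the PassengerId statistics are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the IQR is Q3 minus Q1, and the range is maximum minus minimum. The following sketch checks the reported figures against those identities. The dictionary layout and the `check` helper are illustrative assumptions, and small tolerances absorb the rounding in the printed values.

```python
import math

# Values copied from the PassengerId tables.
stats = {
    "A": {"mean": 433.0, "std": 252.47345, "variance": 63742.845, "cv": 0.58307957,
          "q1": 216.25, "q3": 647.25, "iqr": 431.0, "min": 1, "max": 884, "range": 883},
    "B": {"mean": 446.37444, "std": 257.66194, "variance": 66389.677, "cv": 0.57723275,
          "q1": 232.75, "q3": 671.75, "iqr": 439.0, "min": 3, "max": 891, "range": 888},
}


def check(s: dict) -> dict:
    """Compare each reported statistic with the value derived from the others."""
    return {
        "cv": math.isclose(s["std"] / s["mean"], s["cv"], rel_tol=1e-6),
        "variance": math.isclose(s["std"] ** 2, s["variance"], rel_tol=1e-6),
        "iqr": math.isclose(s["q3"] - s["q1"], s["iqr"]),
        "range": s["max"] - s["min"] == s["range"],
    }


results = {name: check(s) for name, s in stats.items()}
for name, outcome in results.items():
    print(name, outcome)
```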
[Figure: Histogram with fixed size bins (bins=50)]

Common values (Dataset A)
Value | Count | Frequency (%)
401 | 1 | 0.2%
561 | 1 | 0.2%
770 | 1 | 0.2%
373 | 1 | 0.2%
307 | 1 | 0.2%
76 | 1 | 0.2%
4 | 1 | 0.2%
791 | 1 | 0.2%
211 | 1 | 0.2%
845 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Common values (Dataset B)
Value | Count | Frequency (%)
56 | 1 | 0.2%
37 | 1 | 0.2%
767 | 1 | 0.2%
525 | 1 | 0.2%
684 | 1 | 0.2%
875 | 1 | 0.2%
723 | 1 | 0.2%
470 | 1 | 0.2%
189 | 1 | 0.2%
444 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Smallest values (Dataset A)
Value | Count | Frequency (%)
1 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%
7 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%
11 | 1 | 0.2%
12 | 1 | 0.2%
17 | 1 | 0.2%
18 | 1 | 0.2%

Smallest values (Dataset B)
Value | Count | Frequency (%)
3 | 1 | 0.2%
4 | 1 | 0.2%
6 | 1 | 0.2%
7 | 1 | 0.2%
8 | 1 | 0.2%
11 | 1 | 0.2%
12 | 1 | 0.2%
13 | 1 | 0.2%
15 | 1 | 0.2%
17 | 1 | 0.2%

Survived
Categorical

Statistic | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB
Count of 0 | 278 | 278
Count of 1 | 168 | 168

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 2 | 2
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
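
As an illustration of what a character property looks like, Python's standard `unicodedata` module exposes the general category and name of each code point (it does not expose scripts or blocks). This is only a sketch of the idea; it is not a description of how the report computed its values, which are all listed as (unknown). The helper name `describe_characters` is an assumption.

```python
import unicodedata


def describe_characters(text: str) -> dict:
    """Map each distinct character to its Unicode general category and name."""
    return {
        ch: (unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in sorted(set(text))
    }


info = describe_characters("Mr. 01")
for ch, (category, name) in info.items():
    # e.g. '0' is category Nd (decimal digit), 'M' is Lu (uppercase letter)
    print(repr(ch), category, name)
```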

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 0 | 1
2nd row | 0 | 0
3rd row | 0 | 0
4th row | 1 | 0
5th row | 0 | 1

Common Values

Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
0 | 278 | 62.3% | 278 | 62.3%
1 | 168 | 37.7% | 168 | 37.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
0 | 278 | 62.3% | 278 | 62.3%
1 | 168 | 37.7% | 168 | 37.7%

Most occurring categories, scripts and blocks

All 446 characters (100.0%) in Dataset A and all 446 (100.0%) in Dataset B fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables are identical to the Most occurring characters table.

Pclass
Categorical

Statistic | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB
Count of 3 | 237 | 246
Count of 1 | 108 | 114
Count of 2 | 101 | 86

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 3 | 3
2nd row | 3 | 1
3rd row | 3 | 3
4th row | 1 | 3
5th row | 3 | 2

Common Values

Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
3 | 237 | 53.1% | 246 | 55.2%
1 | 108 | 24.2% | 114 | 25.6%
2 | 101 | 22.6% | 86 | 19.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
3 | 237 | 53.1% | 246 | 55.2%
1 | 108 | 24.2% | 114 | 25.6%
2 | 101 | 22.6% | 86 | 19.3%

Most occurring categories, scripts and blocks

All 446 characters (100.0%) in Dataset A and all 446 (100.0%) in Dataset B fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables are identical to the Most occurring characters table.

Name
Text

Statistic | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 82 | 61
Median length | 50 | 46
Mean length | 26.742152 | 26.596413
Min length | 13 | 12

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 11927 | 11862
Distinct characters | 60 | 60
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 446 | 446
Unique (%) | 100.0% | 100.0%

Sample

Row | Dataset A | Dataset B
1st row | Morrow, Mr. Thomas Rowan | Mamee, Mr. Hanna
2nd row | Gronnestad, Mr. Daniel Danielsen | Brewe, Dr. Arthur Jackson
3rd row | Beavan, Mr. William Thomas | Kassem, Mr. Fared
4th row | Fleming, Miss. Margaret | Goodwin, Mr. Charles Edward
5th row | Moen, Mr. Sigurd Hansen | Abelson, Mrs. Samuel (Hannah Wizosky)
Most occurring words (Dataset A)
Value | Count | Frequency (%)
mr | 268 | 14.8%
miss | 96 | 5.3%
mrs | 62 | 3.4%
william | 34 | 1.9%
john | 18 | 1.0%
henry | 17 | 0.9%
james | 16 | 0.9%
master | 16 | 0.9%
george | 14 | 0.8%
charles | 12 | 0.7%
Other values (872) | 1255 | 69.4%

Most occurring words (Dataset B)
Value | Count | Frequency (%)
mr | 269 | 15.0%
miss | 95 | 5.3%
mrs | 54 | 3.0%
william | 36 | 2.0%
henry | 18 | 1.0%
john | 17 | 0.9%
master | 17 | 0.9%
george | 16 | 0.9%
james | 13 | 0.7%
charles | 13 | 0.7%
Other values (899) | 1244 | 69.4%
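
Word counts of this kind can be approximated with a simple tokenizer. The sketch below assumes that values are split on whitespace, stripped of leading and trailing punctuation, and lower-cased; that rule is inferred from tokens such as `mr` and `miss` and is not confirmed to be what the report uses. The function name `word_counts` is illustrative, and the input is the five Dataset A sample rows.

```python
import string
from collections import Counter


def word_counts(values: list) -> Counter:
    """Count lower-cased whitespace-separated tokens, ignoring edge punctuation."""
    counts = Counter()
    for value in values:
        for token in value.split():
            cleaned = token.strip(string.punctuation).lower()
            if cleaned:
                counts[cleaned] += 1
    return counts


# The five Dataset A sample rows of the Name variable.
names = [
    "Morrow, Mr. Thomas Rowan",
    "Gronnestad, Mr. Daniel Danielsen",
    "Beavan, Mr. William Thomas",
    "Fleming, Miss. Margaret",
    "Moen, Mr. Sigurd Hansen",
]

counts = word_counts(names)
total = sum(counts.values())
for word, n in counts.most_common(3):
    # Frequency is expressed relative to the total number of tokens.
    print(f"{word} | {n} | {100 * n / total:.1f}%")
```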

Most occurring characters

Dataset A
Value | Count | Frequency (%)
(space) | 1363 | 11.4%
r | 971 | 8.1%
e | 877 | 7.4%
a | 823 | 6.9%
i | 673 | 5.6%
s | 630 | 5.3%
n | 611 | 5.1%
M | 565 | 4.7%
l | 543 | 4.6%
o | 462 | 3.9%
Other values (50) | 4409 | 37.0%

Dataset B
Value | Count | Frequency (%)
(space) | 1346 | 11.3%
r | 949 | 8.0%
e | 851 | 7.2%
a | 788 | 6.6%
i | 678 | 5.7%
s | 659 | 5.6%
n | 632 | 5.3%
M | 541 | 4.6%
l | 532 | 4.5%
o | 508 | 4.3%
Other values (50) | 4378 | 36.9%

Most occurring categories, scripts and blocks

All 11927 characters (100.0%) in Dataset A and all 11862 (100.0%) in Dataset B fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables are identical to the Most occurring characters table.

Sex
Categorical

Statistic | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB
Count of male | 288 | 294
Count of female | 158 | 152

Length

Statistic | Dataset A | Dataset B
Max length | 6 | 6
Median length | 4 | 4
Mean length | 4.7085202 | 4.6816143
Min length | 4 | 4

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 2100 | 2088
Distinct characters | 5 | 5
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | male | male
2nd row | male | male
3rd row | male | male
4th row | female | male
5th row | male | female

Common Values

Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
male | 288 | 64.6% | 294 | 65.9%
female | 158 | 35.4% | 152 | 34.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
e | 604 | 28.8% | 598 | 28.6%
m | 446 | 21.2% | 446 | 21.4%
a | 446 | 21.2% | 446 | 21.4%
l | 446 | 21.2% | 446 | 21.4%
f | 158 | 7.5% | 152 | 7.3%

Most occurring categories, scripts and blocks

All 2100 characters (100.0%) in Dataset A and all 2088 (100.0%) in Dataset B fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables are identical to the Most occurring characters table.

Age
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 74 | 75
Distinct (%) | 20.7% | 20.7%
Missing | 88 | 83
Missing (%) | 19.7% | 18.6%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 29.750698 | 29.955923
Minimum | 0.42 | 0.67
Maximum | 80 | 80
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0.42 | 0.67
5-th percentile | 4.85 | 4
Q1 | 21 | 21
Median | 29 | 28
Q3 | 38 | 38
95-th percentile | 54 | 57.8
Maximum | 80 | 80
Range | 79.58 | 79.33
Interquartile range (IQR) | 17 | 17

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 14.074854 | 14.624023
Coefficient of variation (CV) | 0.47309323 | 0.48818468
Kurtosis | 0.11118063 | 0.42182478
Mean | 29.750698 | 29.955923
Median Absolute Deviation (MAD) | 8.25 | 8
Skewness | 0.27959062 | 0.46640687
Sum | 10650.75 | 10874
Variance | 198.10151 | 213.86204
Monotonicity | Not monotonic | Not monotonic
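
Because Age has missing values, it helps to see which denominator each figure uses. The reported numbers are consistent with the mean and Distinct (%) being computed over non-missing observations only, while Missing (%) uses all 446 observations. This is inferred from the figures themselves rather than from the tool's documentation, and the `summarize` helper below is an illustrative assumption.

```python
N_OBSERVATIONS = 446

# Values copied from the Age tables.
age = {
    "A": {"missing": 88, "distinct": 74, "sum": 10650.75},
    "B": {"missing": 83, "distinct": 75, "sum": 10874.0},
}


def summarize(missing: int, distinct: int, total: float) -> dict:
    """Derive the mean and percentages from counts and the column sum."""
    present = N_OBSERVATIONS - missing  # observations with an Age value
    return {
        "present": present,
        "mean": total / present,
        "distinct_pct": 100.0 * distinct / present,
        "missing_pct": 100.0 * missing / N_OBSERVATIONS,
    }


summary = {name: summarize(v["missing"], v["distinct"], v["sum"]) for name, v in age.items()}
for name, s in summary.items():
    print(name, f"mean={s['mean']:.6f}",
          f"distinct={s['distinct_pct']:.1f}%", f"missing={s['missing_pct']:.1f}%")
```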
[Figure: Histogram with fixed size bins (bins=50)]

Common values (Dataset A)
Value | Count | Frequency (%)
24 | 16 | 3.6%
22 | 15 | 3.4%
36 | 14 | 3.1%
18 | 14 | 3.1%
30 | 13 | 2.9%
35 | 13 | 2.9%
21 | 12 | 2.7%
31 | 12 | 2.7%
28 | 11 | 2.5%
34 | 10 | 2.2%
Other values (64) | 228 | 51.1%
(Missing) | 88 | 19.7%

Common values (Dataset B)
Value | Count | Frequency (%)
24 | 20 | 4.5%
18 | 15 | 3.4%
28 | 13 | 2.9%
36 | 13 | 2.9%
30 | 12 | 2.7%
25 | 12 | 2.7%
26 | 11 | 2.5%
20 | 11 | 2.5%
21 | 11 | 2.5%
40 | 10 | 2.2%
Other values (65) | 235 | 52.7%
(Missing) | 83 | 18.6%

Smallest values (Dataset A)
Value | Count | Frequency (%)
0.42 | 1 | 0.2%
0.75 | 2 | 0.4%
0.83 | 1 | 0.2%
1 | 2 | 0.4%
2 | 4 | 0.9%
3 | 3 | 0.7%
4 | 5 | 1.1%
5 | 2 | 0.4%
6 | 3 | 0.7%
7 | 2 | 0.4%

Smallest values (Dataset B)
Value | Count | Frequency (%)
0.67 | 1 | 0.2%
0.75 | 2 | 0.4%
0.83 | 1 | 0.2%
1 | 2 | 0.4%
2 | 7 | 1.6%
3 | 2 | 0.4%
4 | 6 | 1.3%
6 | 3 | 0.7%
7 | 1 | 0.2%
8 | 1 | 0.2%

SibSp
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 7 | 7
Distinct (%) | 1.6% | 1.6%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.51345291 | 0.53363229
Minimum | 0 | 0
Maximum | 8 | 8
Zeros | 311 | 308
Zeros (%) | 69.7% | 69.1%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 0 | 0
Q1 | 0 | 0
Median | 0 | 0
Q3 | 1 | 1
95-th percentile | 2 | 2.75
Maximum | 8 | 8
Range | 8 | 8
Interquartile range (IQR) | 1 | 1

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 1.1469738 | 1.1775002
Coefficient of variation (CV) | 2.2338442 | 2.206576
Kurtosis | 20.43215 | 18.648508
Mean | 0.51345291 | 0.53363229
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 4.0105825 | 3.8637579
Sum | 229 | 238
Variance | 1.3155489 | 1.3865068
Monotonicity | Not monotonic | Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]

Value counts (all distinct values, sorted by value)
Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
0 | 311 | 69.7% | 308 | 69.1%
1 | 97 | 21.7% | 99 | 22.2%
2 | 17 | 3.8% | 16 | 3.6%
3 | 8 | 1.8% | 9 | 2.0%
4 | 6 | 1.3% | 5 | 1.1%
5 | 2 | 0.4% | 4 | 0.9%
8 | 5 | 1.1% | 5 | 1.1%

Parch
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 7 | 6
Distinct (%) | 1.6% | 1.3%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.37443946 | 0.35201794
Minimum | 0 | 0
Maximum | 6 | 5
Zeros | 342 | 345
Zeros (%) | 76.7% | 77.4%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 0 | 0
Q1 | 0 | 0
Median | 0 | 0
Q3 | 0 | 0
95-th percentile | 2 | 2
Maximum | 6 | 5
Range | 6 | 5
Interquartile range (IQR) | 0 | 0

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 0.80513606 | 0.74612585
Coefficient of variation (CV) | 2.1502436 | 2.1195677
Kurtosis | 10.897371 | 8.1967426
Mean | 0.37443946 | 0.35201794
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 2.8525129 | 2.5537868
Sum | 167 | 157
Variance | 0.64824407 | 0.55670378
Monotonicity | Not monotonic | Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]

Value counts (all distinct values, sorted by value)
Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
0 | 342 | 76.7% | 345 | 77.4%
1 | 57 | 12.8% | 55 | 12.3%
2 | 39 | 8.7% | 41 | 9.2%
3 | 4 | 0.9% | 2 | 0.4%
4 | 1 | 0.2% | 1 | 0.2%
5 | 2 | 0.4% | 2 | 0.4%
6 | 1 | 0.2% | not present | not present

Ticket
Text

Statistic | Dataset A | Dataset B
Distinct | 382 | 389
Distinct (%) | 85.7% | 87.2%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 18 | 18
Median length | 17 | 17
Mean length | 7.0538117 | 6.6793722
Min length | 3 | 4

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 3146 | 2979
Distinct characters | 32 | 27
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 331 | 344
Unique (%) | 74.2% | 77.1%

Sample

Row | Dataset A | Dataset B
1st row | 372622 | 2677
2nd row | 8471 | 112379
3rd row | 323951 | 2700
4th row | 17421 | CA 2144
5th row | 348123 | P/PP 3381
Most occurring words (Dataset A)
Value | Count | Frequency (%)
pc | 30 | 5.1%
c.a | 16 | 2.7%
a/5 | 11 | 1.9%
ca | 8 | 1.4%
soton/o.q | 8 | 1.4%
2 | 7 | 1.2%
ston/o | 7 | 1.2%
2343 | 5 | 0.9%
soton/oq | 5 | 0.9%
w./c | 4 | 0.7%
Other values (403) | 482 | 82.7%

Most occurring words (Dataset B)
Value | Count | Frequency (%)
pc | 32 | 5.7%
c.a | 11 | 1.9%
ca | 10 | 1.8%
a/5 | 9 | 1.6%
soton/o.q | 5 | 0.9%
w./c | 5 | 0.9%
2343 | 5 | 0.9%
ston/o | 4 | 0.7%
2 | 4 | 0.7%
a/4 | 4 | 0.7%
Other values (408) | 476 | 84.2%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 384 | 12.2%
1 | 348 | 11.1%
2 | 304 | 9.7%
7 | 235 | 7.5%
6 | 211 | 6.7%
0 | 210 | 6.7%
4 | 210 | 6.7%
5 | 203 | 6.5%
9 | 159 | 5.1%
8 | 145 | 4.6%
Other values (22) | 737 | 23.4%

Dataset B
Value | Count | Frequency (%)
3 | 359 | 12.1%
1 | 354 | 11.9%
2 | 271 | 9.1%
7 | 255 | 8.6%
4 | 245 | 8.2%
0 | 215 | 7.2%
6 | 202 | 6.8%
5 | 193 | 6.5%
9 | 158 | 5.3%
8 | 142 | 4.8%
Other values (17) | 585 | 19.6%

Most occurring categories, scripts and blocks

All 3146 characters (100.0%) in Dataset A and all 2979 (100.0%) in Dataset B fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables are identical to the Most occurring characters table.

Fare
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 170 | 173
Distinct (%) | 38.1% | 38.8%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 29.869544 | 34.032454
Minimum | 0 | 0
Maximum | 512.3292 | 512.3292
Zeros | 11 | 9
Zeros (%) | 2.5% | 2.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 7.05 | 7.162525
Q1 | 7.925 | 7.8958
Median | 14.4542 | 14.4542
Q3 | 29.7 | 31.275
95-th percentile | 108.28125 | 130.2375
Maximum | 512.3292 | 512.3292
Range | 512.3292 | 512.3292
Interquartile range (IQR) | 21.775 | 23.3792

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 44.323541 | 54.25907
Coefficient of variation (CV) | 1.4839042 | 1.5943331
Kurtosis | 37.58623 | 30.286601
Mean | 29.869544 | 34.032454
Median Absolute Deviation (MAD) | 7.2042 | 7.2042
Skewness | 4.9118206 | 4.6439131
Sum | 13321.817 | 15178.475
Variance | 1964.5763 | 2944.0467
Monotonicity | Not monotonic | Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values (Dataset A)
Value | Count | Frequency (%)
8.05 | 25 | 5.6%
13 | 23 | 5.2%
26 | 18 | 4.0%
7.75 | 16 | 3.6%
10.5 | 12 | 2.7%
0 | 11 | 2.5%
7.925 | 10 | 2.2%
7.8958 | 9 | 2.0%
7.8542 | 8 | 1.8%
8.6625 | 8 | 1.8%
Other values (160) | 306 | 68.6%

Common values (Dataset B)
Value | Count | Frequency (%)
13 | 23 | 5.2%
8.05 | 21 | 4.7%
7.75 | 18 | 4.0%
26 | 18 | 4.0%
7.8958 | 17 | 3.8%
10.5 | 11 | 2.5%
26.55 | 10 | 2.2%
8.6625 | 10 | 2.2%
0 | 9 | 2.0%
7.775 | 9 | 2.0%
Other values (163) | 300 | 67.3%

Smallest values (Dataset A)
Value | Count | Frequency (%)
0 | 11 | 2.5%
6.4958 | 2 | 0.4%
6.8583 | 1 | 0.2%
6.95 | 1 | 0.2%
6.975 | 2 | 0.4%
7.0458 | 1 | 0.2%
7.05 | 6 | 1.3%
7.0542 | 2 | 0.4%
7.125 | 4 | 0.9%
7.1417 | 1 | 0.2%

Smallest values (Dataset B)
Value | Count | Frequency (%)
0 | 9 | 2.0%
6.4375 | 1 | 0.2%
6.4958 | 2 | 0.4%
6.75 | 2 | 0.4%
6.975 | 1 | 0.2%
7.05 | 5 | 1.1%
7.0542 | 1 | 0.2%
7.125 | 1 | 0.2%
7.1417 | 1 | 0.2%
7.225 | 8 | 1.8%

Cabin
Text

Statistic | Dataset A | Dataset B
Distinct | 88 | 95
Distinct (%) | 81.5% | 88.0%
Missing | 338 | 338
Missing (%) | 75.8% | 75.8%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 11 | 15
Median length | 3 | 3
Mean length | 3.5092593 | 3.6111111
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 379 | 390
Distinct characters | 19 | 19
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 69 | 85
Unique (%) | 63.9% | 78.7%

Sample

Row | Dataset A | Dataset B
1st row | F G73 | E49
2nd row | C123 | C103
3rd row | F G73 | B94
4th row | E63 | C123
5th row | D6 | F G73
Most occurring words (Dataset A)
Value | Count | Frequency (%)
b96 | 3 | 2.4%
b98 | 3 | 2.4%
f | 3 | 2.4%
g73 | 2 | 1.6%
c22 | 2 | 1.6%
c26 | 2 | 1.6%
c123 | 2 | 1.6%
b58 | 2 | 1.6%
b60 | 2 | 1.6%
e24 | 2 | 1.6%
Other values (85) | 100 | 81.3%

Most occurring words (Dataset B)
Value | Count | Frequency (%)
g6 | 3 | 2.4%
c23 | 3 | 2.4%
c25 | 3 | 2.4%
c27 | 3 | 2.4%
b96 | 3 | 2.4%
b98 | 3 | 2.4%
e67 | 2 | 1.6%
f2 | 2 | 1.6%
e8 | 2 | 1.6%
c123 | 2 | 1.6%
Other values (94) | 101 | 79.5%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
2 | 40 | 10.6%
C | 37 | 9.8%
1 | 37 | 9.8%
B | 31 | 8.2%
6 | 27 | 7.1%
3 | 27 | 7.1%
E | 21 | 5.5%
8 | 21 | 5.5%
9 | 19 | 5.0%
7 | 18 | 4.7%
Other values (9) | 101 | 26.6%

Dataset B
Value | Count | Frequency (%)
C | 41 | 10.5%
2 | 38 | 9.7%
B | 35 | 9.0%
3 | 32 | 8.2%
6 | 30 | 7.7%
1 | 29 | 7.4%
9 | 22 | 5.6%
8 | 21 | 5.4%
5 | 20 | 5.1%
7 | 19 | 4.9%
Other values (9) | 103 | 26.4%

Most occurring categories, scripts and blocks

All 379 characters (100.0%) in Dataset A and all 390 (100.0%) in Dataset B fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables are identical to the Most occurring characters table.

Embarked
Categorical

Statistic | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 1 | 1
Missing (%) | 0.2% | 0.2%
Memory size | 7.0 KiB | 7.0 KiB
Count of S | 333 | 332
Count of C | 73 | 83
Count of Q | 39 | 30

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 445 | 445
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | Q | C
2nd row | S | C
3rd row | S | C
4th row | C | S
5th row | S | C

Common Values

Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
S | 333 | 74.7% | 332 | 74.4%
C | 73 | 16.4% | 83 | 18.6%
Q | 39 | 8.7% | 30 | 6.7%
(Missing) | 1 | 0.2% | 1 | 0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Value | Count (A) | Frequency (A) | Count (B) | Frequency (B)
S | 333 | 74.8% | 332 | 74.6%
C | 73 | 16.4% | 83 | 18.7%
Q | 39 | 8.8% | 30 | 6.7%

Most occurring categories, scripts and blocks

All 445 characters (100.0%) in Dataset A and all 445 (100.0%) in Dataset B fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables are identical to the Most occurring characters table.

Interactions

[Figures: pairwise interaction plots for Dataset A and Dataset B; images not included in this text version]

2025-03-11T15:19:06.449684image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-11T15:19:04.370281image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-11T15:19:06.801934image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

Dataset A

[Figure: correlation heatmap]

Dataset B

[Figure: correlation heatmap]

Dataset A

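            |    Age | Embarked |   Fare |  Parch | PassengerId | Pclass |    Sex |  SibSp | Survived
Age         |  1.000 |    0.109 |  0.160 | -0.232 |      -0.001 |  0.269 |  0.149 | -0.192 |    0.239
Embarked    |  0.109 |    1.000 |  0.197 |  0.000 |       0.000 |  0.244 |  0.123 |  0.138 |    0.177
Fare        |  0.160 |    0.197 |  1.000 |  0.406 |      -0.053 |  0.449 |  0.146 |  0.410 |    0.167
Parch       | -0.232 |    0.000 |  0.406 |  1.000 |       0.021 |  0.000 |  0.300 |  0.439 |    0.151
PassengerId | -0.001 |    0.000 | -0.053 |  0.021 |       1.000 |  0.000 |  0.050 | -0.065 |    0.071
Pclass      |  0.269 |    0.244 |  0.449 |  0.000 |       0.000 |  1.000 |  0.036 |  0.113 |    0.317
Sex         |  0.149 |    0.123 |  0.146 |  0.300 |       0.050 |  0.036 |  1.000 |  0.226 |    0.492
SibSp       | -0.192 |    0.138 |  0.410 |  0.439 |      -0.065 |  0.113 |  0.226 |  1.000 |    0.138
Survived    |  0.239 |    0.177 |  0.167 |  0.151 |       0.071 |  0.317 |  0.492 |  0.138 |    1.000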

Dataset B

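            |    Age | Embarked |   Fare |  Parch | PassengerId | Pclass |    Sex |  SibSp | Survived
Age         |  1.000 |    0.000 |  0.183 | -0.258 |      -0.054 |  0.307 |  0.161 | -0.123 |    0.215
Embarked    |  0.000 |    1.000 |  0.209 |  0.000 |       0.000 |  0.257 |  0.083 |  0.000 |    0.152
Fare        |  0.183 |    0.209 |  1.000 |  0.424 |      -0.053 |  0.480 |  0.187 |  0.466 |    0.254
Parch       | -0.258 |    0.000 |  0.424 |  1.000 |       0.035 |  0.000 |  0.287 |  0.472 |    0.182
PassengerId | -0.054 |    0.000 | -0.053 |  0.035 |       1.000 |  0.000 |  0.076 | -0.000 |    0.065
Pclass      |  0.307 |    0.257 |  0.480 |  0.000 |       0.000 |  1.000 |  0.114 |  0.163 |    0.303
Sex         |  0.161 |    0.083 |  0.187 |  0.287 |       0.076 |  0.114 |  1.000 |  0.164 |    0.538
SibSp       | -0.123 |    0.000 |  0.466 |  0.472 |      -0.000 |  0.163 |  0.164 |  1.000 |    0.203
Survived    |  0.215 |    0.152 |  0.254 |  0.182 |       0.065 |  0.303 |  0.538 |  0.203 |    1.000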
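
The Alerts section flags Sex and Survived as highly correlated in Dataset B only. A minimal sketch of that kind of check, applied to the Survived rows of the two correlation tables, is given here. The cut-off of 0.5 is an assumption (the report does not state the threshold it applied), and `highly_correlated` is a hypothetical helper name, not a ydata-profiling function.

```python
# Survived rows copied from the Dataset A and Dataset B correlation tables.
COLUMNS = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
           "Pclass", "Sex", "SibSp", "Survived"]
SURVIVED_A = [0.239, 0.177, 0.167, 0.151, 0.071, 0.317, 0.492, 0.138, 1.000]
SURVIVED_B = [0.215, 0.152, 0.254, 0.182, 0.065, 0.303, 0.538, 0.203, 1.000]

# Assumed cut-off; the report itself does not say which value was used.
THRESHOLD = 0.5


def highly_correlated(target, columns, values, threshold=THRESHOLD):
    """Return columns whose absolute correlation with `target` reaches the threshold."""
    return [
        name
        for name, value in zip(columns, values)
        if name != target and abs(value) >= threshold
    ]


flagged_a = highly_correlated("Survived", COLUMNS, SURVIVED_A)
flagged_b = highly_correlated("Survived", COLUMNS, SURVIVED_B)
print("Dataset A:", flagged_a)  # prints: Dataset A: []
print("Dataset B:", flagged_b)  # prints: Dataset B: ['Sex']
```

Under this assumed threshold, the Sex/Survived coefficient of 0.492 in Dataset A falls just short while 0.538 in Dataset B passes, which is consistent with the alert appearing for Dataset B only.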

Missing values

Dataset A

Figure: A simple visualization of nullity by column.

Dataset B

Figure: A simple visualization of nullity by column.

Dataset A

Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset B

Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset A

Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset B

Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
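
As a rough illustration of the quantities behind these views, the sketch below counts missing cells per column and computes a nullity correlation as the Pearson correlation between missing/present indicators of two columns. The rows are hypothetical and not taken from either dataset, `missing_counts` and `nullity_correlation` are illustrative helper names, and the report's own plots may be computed differently.

```python
from math import sqrt

# Hypothetical rows (not from the report); None marks a missing cell.
rows = [
    {"Age": 22.0, "Cabin": None, "Embarked": "S"},
    {"Age": None, "Cabin": None, "Embarked": "Q"},
    {"Age": 35.0, "Cabin": "C123", "Embarked": "S"},
    {"Age": None, "Cabin": None, "Embarked": "C"},
    {"Age": 3.0, "Cabin": None, "Embarked": "S"},
    {"Age": 39.0, "Cabin": "A36", "Embarked": "S"},
]


def missing_counts(rows):
    """Number of missing cells in each column."""
    return {col: sum(row[col] is None for row in rows) for col in rows[0]}


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because the correlation is undefined for a constant indicator.
    """
    a = [float(row[col_a] is None) for row in rows]
    b = [float(row[col_b] is None) for row in rows]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


counts = missing_counts(rows)
age_cabin = nullity_correlation(rows, "Age", "Cabin")
print(counts)               # prints: {'Age': 2, 'Cabin': 4, 'Embarked': 0}
print(f"{age_cabin:.3f}")   # prints: 0.500
print(nullity_correlation(rows, "Age", "Embarked"))  # prints: None
```

A value near 1 means two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means their missingness is unrelated.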

Sample

Dataset A

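    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
560 | 561 | 0 | 3 | Morrow, Mr. Thomas Rowan | male | NaN | 0 | 0 | 372622 | 7.7500 | NaN | Q
769 | 770 | 0 | 3 | Gronnestad, Mr. Daniel Danielsen | male | 32.0 | 0 | 0 | 8471 | 8.3625 | NaN | S
372 | 373 | 0 | 3 | Beavan, Mr. William Thomas | male | 19.0 | 0 | 0 | 323951 | 8.0500 | NaN | S
306 | 307 | 1 | 1 | Fleming, Miss. Margaret | female | NaN | 0 | 0 | 17421 | 110.8833 | NaN | C
75 | 76 | 0 | 3 | Moen, Mr. Sigurd Hansen | male | 25.0 | 0 | 0 | 348123 | 7.6500 | F G73 | S
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
790 | 791 | 0 | 3 | Keane, Mr. Andrew "Andy" | male | NaN | 0 | 0 | 12460 | 7.7500 | NaN | Q
210 | 211 | 0 | 3 | Ali, Mr. Ahmed | male | 24.0 | 0 | 0 | SOTON/O.Q. 3101311 | 7.0500 | NaN | S
844 | 845 | 0 | 3 | Culumovic, Mr. Jeso | male | 17.0 | 0 | 0 | 315090 | 8.6625 | NaN | S
348 | 349 | 1 | 3 | Coutts, Master. William Loch "William" | male | 3.0 | 1 | 1 | C.A. 37671 | 15.9000 | NaN | S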

Dataset B

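    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
36 | 37 | 1 | 3 | Mamee, Mr. Hanna | male | NaN | 0 | 0 | 2677 | 7.2292 | NaN | C
766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN | 0 | 0 | 112379 | 39.6000 | NaN | C
524 | 525 | 0 | 3 | Kassem, Mr. Fared | male | NaN | 0 | 0 | 2700 | 7.2292 | NaN | C
683 | 684 | 0 | 3 | Goodwin, Mr. Charles Edward | male | 14.0 | 5 | 2 | CA 2144 | 46.9000 | NaN | S
874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C
722 | 723 | 0 | 2 | Gillespie, Mr. William Henry | male | 34.0 | 0 | 0 | 12233 | 13.0000 | NaN | S
469 | 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C
188 | 189 | 0 | 3 | Bourke, Mr. John | male | 40.0 | 1 | 1 | 364849 | 15.5000 | NaN | Q
443 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.0000 | NaN | S
535 | 536 | 1 | 2 | Hart, Miss. Eva Miriam | female | 7.0 | 0 | 2 | F.C.C. 13529 | 26.2500 | NaN | S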

Dataset A

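    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
277 | 278 | 0 | 2 | Parkes, Mr. Francis "Frank" | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S
478 | 479 | 0 | 3 | Karlsson, Mr. Nils August | male | 22.0 | 0 | 0 | 350060 | 7.5208 | NaN | S
784 | 785 | 0 | 3 | Ali, Mr. William | male | 25.0 | 0 | 0 | SOTON/O.Q. 3101312 | 7.0500 | NaN | S
465 | 466 | 0 | 3 | Goncalves, Mr. Manuel Estanslas | male | 38.0 | 0 | 0 | SOTON/O.Q. 3101306 | 7.0500 | NaN | S
806 | 807 | 0 | 1 | Andrews, Mr. Thomas Jr | male | 39.0 | 0 | 0 | 112050 | 0.0000 | A36 | S
802 | 803 | 1 | 1 | Carter, Master. William Thornton II | male | 11.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S
880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | NaN | S
702 | 703 | 0 | 3 | Barbara, Miss. Saiide | female | 18.0 | 0 | 1 | 2691 | 14.4542 | NaN | C
169 | 170 | 0 | 3 | Ling, Mr. Lee | male | 28.0 | 0 | 0 | 1601 | 56.4958 | NaN | S
400 | 401 | 1 | 3 | Niskanen, Mr. Juha | male | 39.0 | 0 | 0 | STON/O 2. 3101289 | 7.9250 | NaN | S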

Dataset B

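    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
855 | 856 | 1 | 3 | Aks, Mrs. Sam (Leah Rosen) | female | 18.0 | 0 | 1 | 392091 | 9.3500 | NaN | S
565 | 566 | 0 | 3 | Davies, Mr. Alfred J | male | 24.0 | 2 | 0 | A/4 48871 | 24.1500 | NaN | S
276 | 277 | 0 | 3 | Lindblom, Miss. Augusta Charlotta | female | 45.0 | 0 | 0 | 347073 | 7.7500 | NaN | S
131 | 132 | 0 | 3 | Coelho, Mr. Domingos Fernandeo | male | 20.0 | 0 | 0 | SOTON/O.Q. 3101307 | 7.0500 | NaN | S
780 | 781 | 1 | 3 | Ayoub, Miss. Banoura | female | 13.0 | 0 | 0 | 2687 | 7.2292 | NaN | C
376 | 377 | 1 | 3 | Landergren, Miss. Aurora Adelia | female | 22.0 | 0 | 0 | C 7077 | 7.2500 | NaN | S
666 | 667 | 0 | 2 | Butler, Mr. Reginald Fenton | male | 25.0 | 0 | 0 | 234686 | 13.0000 | NaN | S
355 | 356 | 0 | 3 | Vanden Steen, Mr. Leo Peter | male | 28.0 | 0 | 0 | 345783 | 9.5000 | NaN | S
394 | 395 | 1 | 3 | Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson) | female | 24.0 | 0 | 2 | PP 9549 | 16.7000 | G6 | S
55 | 56 | 1 | 1 | Woolner, Mr. Hugh | male | NaN | 0 | 0 | 19947 | 35.5000 | C52 | S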

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
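
For reference, a duplicate-row check of this kind can be sketched as counting identical rows and keeping those that occur more than once, which corresponds to the "# duplicates" column the report would show. The rows below are hypothetical and `duplicate_rows` is an illustrative helper name; this is not necessarily how the report computes its result.

```python
from collections import Counter

# Hypothetical rows as (PassengerId, Sex, Age) tuples, not taken from the report.
rows = [
    (1, "male", 22.0),
    (2, "female", 38.0),
    (3, "female", 26.0),
]


def duplicate_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


no_duplicates = duplicate_rows(rows)
with_duplicate = duplicate_rows(rows + [rows[0]])
print(no_duplicates)   # prints: {}
print(with_duplicate)  # prints: {(1, 'male', 22.0): 2}
```

Because the report lists PassengerId as having unique values in both datasets, no two full rows can be identical, which agrees with the empty result shown for each dataset.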